Bayes Networks for Supporting Query Processing Over Incomplete Autonomous Databases

Abstract

As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when there are tuples containing missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayes networks. Our approach involves mining/"learning" Bayes networks from a sample of the database, and using it to do both imputation (predict a missing value) and query rewriting (retrieve relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). We present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayes networks do provide a significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayes networks provide higher precision and recall than AFDs while keeping query processing costs manageable.
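
The idea of treating an incomplete tuple as a distribution over the complete tuples it stands for can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy car sample, the attribute names, the hand-fixed chain structure `make -> model -> body`, and the helper names `learn_cpts`, `joint`, and `completions`. The conditional probability tables are estimated by simple relative frequency from the sample, and missing values are filled jointly by enumerating all candidate completions.

```python
from collections import Counter
from itertools import product

# Hypothetical sample of complete tuples (make, model, body); not from the paper.
SAMPLE = [
    ("Honda", "Civic", "sedan"),
    ("Honda", "Civic", "sedan"),
    ("Honda", "Civic", "coupe"),
    ("Honda", "CR-V", "suv"),
    ("Toyota", "Camry", "sedan"),
    ("Toyota", "Camry", "sedan"),
    ("Toyota", "RAV4", "suv"),
    ("Toyota", "RAV4", "suv"),
]
ATTRS = ("make", "model", "body")
# Assumed network structure, fixed by hand: make -> model -> body.
PARENT = {"make": None, "model": "make", "body": "model"}


def learn_cpts(rows):
    """Estimate P(attr | parent) for every attribute by relative frequency."""
    cpts = {}
    for i, attr in enumerate(ATTRS):
        parent = PARENT[attr]
        j = ATTRS.index(parent) if parent else None
        # Keys are (parent_value, value); the root uses None as parent value.
        pair_counts = Counter((row[j] if parent else None, row[i]) for row in rows)
        parent_counts = Counter(row[j] if parent else None for row in rows)
        cpts[attr] = {k: c / parent_counts[k[0]] for k, c in pair_counts.items()}
    return cpts


def joint(cpts, tup):
    """Probability of a complete tuple under the chain-structured network."""
    p = 1.0
    for i, attr in enumerate(ATTRS):
        parent = PARENT[attr]
        parent_value = tup[ATTRS.index(parent)] if parent else None
        p *= cpts[attr].get((parent_value, tup[i]), 0.0)
    return p


def completions(cpts, rows, incomplete):
    """Distribution over complete tuples consistent with `incomplete`.

    Missing attribute values are given as None.
    """
    domains = [
        sorted({r[i] for r in rows}) if value is None else [value]
        for i, value in enumerate(incomplete)
    ]
    scores = {c: joint(cpts, c) for c in product(*domains)}
    total = sum(scores.values())
    if total == 0:
        return {}
    return {c: s / total for c, s in scores.items() if s > 0}


cpts = learn_cpts(SAMPLE)
# A tuple with two correlated attributes missing: only the body style is known.
dist = completions(cpts, SAMPLE, (None, None, "suv"))
best = max(dist, key=dist.get)
print(best, dist[best])
```

Because make and model are completed together, this toy example can only produce pairs that co-occur in the sample, whereas predicting each missing attribute independently could pair values that never appear together. Enumeration is only practical for tiny domains; a real system would need a learned structure and a proper inference procedure.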